Polymer links for the POLYLINK command completed in REGISTRY 
New UPM (Update Code Maximum) field for more efficient patent 
SDIs in CAplus 


NEWS 6 May 27 CAplus super roles and document types searchable in REGISTRY 
Additional enzyme-catalyzed reactions added to CASREACT 
ANTE, AQUALINE, BIOENG, CIVILENG, ENVIROENG, MECHENG, 
and WATER from CSA now available on STN(R) 
BEILSTEIN enhanced with new display and select options, 
resulting in a closer connection to BABS 
BEILSTEIN on STN workshop to be held August 24 in conjunction 
with the 228th ACS National Meeting 
fields 
CAplus and CA patent records enhanced with European and Japan 
Patent Office Classifications 
STN User Update to be held August 22 in conjunction with the 
228th ACS National Meeting 
The Analysis Edition of STN Express with Discover! 
(Version 7.01 for Windows) now available 
Pricing for the Save Answers for SciFinder Wizard within 
STN Express with Discover! will change September 1, 2004 
JULY 30 CURRENT WINDOWS VERSION IS V7.01, CURRENT 
MACINTOSH VERSION IS V6.0c(ENG) AND V6.0Jc(JP), 
AND CURRENT DISCOVER FILE IS DATED 11 AUGUST 2004 
STN Operating Hours Plus Help Desk Availability 
General Internet Information 
Welcome Banner and News Items 

Direct Dial and Telecommunication Network Access to STN 
CAS World Wide Web Site (general information) 



Enter NEWS followed by the item number or name to see news on that 
specific topic. 

All use of STN is subject to the provisions of the STN Customer 
agreement. Please note that this agreement limits use to scientific 
research. Use for software development or design or implementation 
of commercial gateways or other similar uses is prohibited and may 
result in loss of user privileges and other penalties. 
FILE 'BIOSIS' ENTERED AT 15:39:40 ON 20 AUG 2004 
COPYRIGHT (C) 2004 BIOLOGICAL ABSTRACTS INC. (R) 

=> s e3-6 

L1 21 ("TOLDO L"/AU OR "TOLDO L I"/AU OR "TOLDO L I G"/AU OR "TOLDO 
LUCA"/AU) 

=> duplicate remove l1 

DUPLICATE PREFERENCE IS 'MEDLINE, BIOSIS' 

KEEP DUPLICATES FROM MORE THAN ONE FILE? Y/ (N) :n 

PROCESSING COMPLETED FOR L1 

L2 18 DUPLICATE REMOVE L1 (3 DUPLICATES REMOVED) 

=> d 1-18 bib ab 

L2 ANSWER 1 OF 18 MEDLINE on STN 

Full Text 

AN 2004162614 IN-PROCESS 
DN PubMed ID: 15057406 

TI [Toxicoproteomics: first experiences in a BMBF-study]. 
Toxikoproteomics: Erste Erfahrungen in einer BMBF-Studie. 
AU Kroger Michaela; Hellmann Jurgen; Toldo Luca; Gluckmann Matthias; von 
Eiff Bettina; Fella Kerstin; Kramer Peter-Jurgen 
CS Institut fur Toxikologie, Merck KGaA, D-Darmstadt.. 
michaela.kroeger@merck.com 
SO ALTEX : Alternativen zu Tierexperimenten, (2004) 21 Suppl 3 28-40. 

Journal code: 100953980. ISSN: 0946-7785. 
CY Germany: Germany, Federal Republic of 
DT Journal; Article; (JOURNAL ARTICLE) 
LA German 

FS IN-PROCESS; NONINDEXED; Priority Journals 
ED Entered STN: 20040402 

Last Updated on STN: 20040505 
AB The rapid development of molecular toxicology is providing innovative 
approaches to an improved investigation and recognition of toxic 
substances. Proteome analysis offers, with 2DE/MS (two-dimensional gel 
electrophoresis and mass spectrometry) and SELDI (surface enhanced laser 
desorption/ionisation), a promising discipline to classify molecular 
changes caused by toxic exposure. The Rat Liver Foci Bioassay (RLFB) is a 
detailed, well-described model for the investigation of liver 
carcinogenesis induced by chemical substances. Based on this model, we 
examined whether proteomic methods of molecular toxicology can be used for 
the early recognition of toxic and/or carcinogenic characteristics of 
toxic substances. In addition, identification and subsequent 
prevalidation of new hepatocellular biomarkers was performed, enabling 
better prediction of toxic and/or carcinogenic effects. This could lead 
to a more meaningful RLFB and thus to an improved risk assessment of 
chemicals. 2DE analysis in this study showed that deregulated proteins are 
assigned to mainly anabolic and catabolic metabolism pathways in the cell. 
Beyond this, individual proteins were identified which play a key role in 
the carcinogenic process. A comparison of the differentially expressed 
proteins in tissue from tumour-bearing animals and tissue derived from the 
start of the study revealed that protein expression changes (biomarkers) 
were already detectable shortly after exposure. In addition, analysis by 
SELDI clearly showed several differentially expressed proteins and/or 
derived masses. The spectra represented specific differences in tissues, 
which could be assigned to the same histopathological endpoints. With 
bioinformatics analysis it was possible to identify individual 
discriminating mass peaks, which were indicative of tumour formation. 
Group specific changes can be illustrated and/or represented in more 
detail with further cluster analysis methods. These results give hope for 
an improved prediction of hepatotoxicity and carcinogenicity by means of 
protein markers, which could in the future lead to a shortening of 
carcinogenicity studies and to a reduction in the use of experimental 
animals. 

L2 ANSWER 2 OF 18 BIOSIS COPYRIGHT 2004 BIOLOGICAL ABSTRACTS INC. on STN 
Full Text 

AN 2003:307420 BIOSIS 
DN PREV200300307420 

TI Biomarker identification in toxicology by SELDI. 

AU Knapp, U. [Reprint Author]; Fella, K. [Reprint Author]; von Eiff, B. 
[Reprint Author]; Toldo, L.; Hellmann, J. [Reprint Author]; Kroeger, M. 
[Reprint Author] 
CS Merck KGaA, Institute of Toxicology, Darmstadt, Germany 

SO Naunyn-Schmiedeberg's Archives of Pharmacology, (March 2003) Vol. 367, No. 
Supplement 1, pp. R154. print. 

Meeting Info.: 44th Spring Meeting of the Deutsche Gesellschaft fuer 
Experimentelle und Klinische Pharmakologie und Toxikologie and the 20th 
Meeting of the Gesellschaft fuer Umwelt-Mutationsforschung. Mainz, 
Germany. March 17-20, 2003. 
ISSN: 0028-1298 (ISSN print). 

DT Conference; (Meeting) 

Conference; Abstract; (Meeting Abstract) 

LA English 

ED Entered STN: 2 Jul 2003 

Last Updated on STN: 2 Jul 2003 

L2 ANSWER 3 OF 18 MEDLINE on STN DUPLICATE 1 

Full Text 

AN 2002322867 MEDLINE 
DN PubMed ID: 12065231 

TI 6-Carboxymethyl genistein: a novel selective oestrogen receptor modulator 
(SERM) with unique, differential effects on the vasculature, bone and 
uterus. 

AU Somjen D; Amir-Zaltsman Y; Gayer B; Kulik T; Knoll E; Stern N; Lu L J W; 
Toldo L; Kohen F 

CS Department of Biological Regulation, Weizmann Institute of Science, 

Rehovot, 76100 Israel. 
NC P30 ES 06676 (NIEHS) 

SO Journal of endocrinology, (2002 Jun) 173 (3) 415-27. 

Journal code: 0375363. ISSN: 0022-0795. 
CY England: United Kingdom 
DT Journal; Article; (JOURNAL ARTICLE) 
LA English 
FS Priority Journals 



STN Columbus 



EM 200208 

ED Entered STN: 20020615 

Last Updated on STN: 20020810 
Entered Medline: 20020809 

AB The novel genistein (G) derivative, 6-carboxymethyl genistein (CG) was 
evaluated for its biological properties in comparison with G. Both 
compounds showed oestrogenic activity in vitro and in vivo. On the other 
hand G and CG differed in the following parameters: (i) only CG displayed 
mixed agonist-antagonist activity for oestrogen receptor (ER) alpha in 
transactivation assays and (ii) only CG was capable of attenuating 
oestrogen (E(2))-induced proliferation in vascular smooth muscle cells and 
of inhibiting oestrogen-induced creatine kinase (CK) specific activity in 
rat tissues. On the other hand only G enhanced the stimulatory effect on 
CK specific activity in the uterus. In comparison to the selective 
oestrogen receptor modulator (SERM) raloxifene (RAL), CG showed the same 
selectivity profile as RAL in blocking the CK response to E(2) in tissues 
derived from both immature and ovariectomized female rats. Molecular 
modelling of CG bound to the ligand binding domain (LBD) of ERbeta 
predicts that the 6-carboxymethyl group of CG almost fits the binding 
cavity. On the other hand, molecular modelling of CG bound to the LBD of 
ERalpha suggests that the carboxyl group of CG may perturb the end of 
Helix 11, eliciting a severe backbone change for Leu 525, and consequently 
induces a conformational change which could position Helix 12 in an 
antagonist conformation. This model supports the experimental findings 
that CG can act as a mixed agonist-antagonist when E(2) is bound to its 
receptors. Collectively, our findings suggest that CG can be considered a 
novel SERM with unique effects on the vasculature, bone and uterus. 

L2 ANSWER 4 OF 18 MEDLINE on STN 

Full Text 

AN 2001102271 MEDLINE 
DN PubMed ID: 10977085 

TI An evaluation of ontology exchange languages for bioinformatics. 

AU McEntire R; Karp P; Abernethy N; Benton D; Helt G; DeJongh M; Kent R; 

Kosky A; Lewis S; Hodnett D; Neumann E; Olken F; Pathak D; Tarczy-Hornoch 

P; Toldo L; Topaloglou T 
CS SmithKline Beecham Pharmaceuticals, King of Prussia, PA 19406, USA.. 

Robin A McEntire@sbphrd.com 
SO Proceedings / ... International Conference on Intelligent Systems for 

Molecular Biology ; ISMB. International Conference on Intelligent Systems 

for Molecular Biology, (2000) 8 239-50. 

Journal code: 9509125. 
CY United States 

DT Journal; Article; (JOURNAL ARTICLE) 

LA English 

FS Priority Journals 

EM 200101 

ED Entered STN: 20010322 

Last Updated on STN: 20010322 
Entered Medline: 20010126 

AB Ontologies are specifications of the concepts in a given field, and of the 
relationships among those concepts. The development of ontologies for 
molecular-biology information and the sharing of those ontologies within 
the bioinformatics community are central problems in bioinformatics. If 
the bioinformatics community is to share ontologies effectively, 
ontologies must be exchanged in a form that uses standardized syntax and 
semantics. This paper reports on an effort among the authors to evaluate 
alternative ontology-exchange languages, and to recommend one or more 
languages for use within the larger bioinformatics community. The study 
selected a set of candidate languages, and defined a set of capabilities 
that the ideal ontology-exchange language should satisfy. The study 
scored the languages according to the degree to which they satisfied each 
capability. In addition, the authors performed several ontology-exchange 
experiments with the two languages that received the highest scores: OML 
and Ontolingua. The result of those experiments, and the main conclusion 
of this study, was that the frame-based semantic model of Ontolingua is 
preferable to the conceptual graph model of OML, but that the XML-based 
syntax of OML is preferable to the Lisp-based syntax of Ontolingua. 

L2 ANSWER 5 OF 18 MEDLINE on STN 

Full Text 

AN 2000139296 MEDLINE 
DN PubMed ID: 10681126 

TI Web alert. Cell differentiation cell multiplication. 

AU Pines J; Toldo L; Lafont F 

CS Wellcome/CRC Institute, Cambridge, UK. 

SO Current opinion in cell biology, (1999 Dec) 11 (6) 651-2. 

Journal code: 8913428. ISSN: 0955-0674. 
CY United States 
DT (DIRECTORY) 
LA English 
FS Priority Journals 
EM 200002 

ED Entered STN: 20000229 

Last Updated on STN: 20000229 
Entered Medline: 20000215 

L2 ANSWER 6 OF 18 MEDLINE on STN 

Full Text 

AN 2000066494 MEDLINE 
DN PubMed ID: 10610095 

TI Cell-to-cell contact and extracellular matrix. Web Alert. 
AU Pines J; Toldo L; Lafont F 

CS Wellcome/CRC Institute, Cambridge, UK.. JP103@mole.bio.cam.ac.uk 
SO Current opinion in cell biology, (1999 Oct) 11 (5) 535-6. 

Journal code: 8913428. ISSN: 0955-0674. 
CY United States 
DT (DIRECTORY) 
LA English 
FS Priority Journals 
EM 199912 

ED Entered STN: 20000113 

Last Updated on STN: 20000124 
Entered Medline: 19991216 

L2 ANSWER 7 OF 18 MEDLINE on STN 

Full Text 

AN 1999347318 MEDLINE 

DN PubMed ID: 10428544 

TI Nucleus and gene expression. Web alert. 

AU Pines J; Toldo L; Lafont F 

CS Wellcome/CRC Institute, Cambridge, UK.. JP103@mole.bio.cam.ac.uk 

SO Current opinion in cell biology, (1999 Jun) 11 (3) 301. 
Journal code: 8913428. ISSN: 0955-0674. 

CY United States 

DT (DIRECTORY) 

LA English 

FS Priority Journals 

EM 199907 

ED Entered STN: 19990806 
Last Updated on STN: 19990806 
Entered Medline: 19990723 

AN 1998386329 MEDLINE 
DN PubMed ID: 9719860 

TI Membrane permeability. Membranes and sorting. 
AU Pines J; Toldo L; Lafont F 

CS Wellcome/CRC Institute, Cambridge.. JP103@mole.cam.ac.uk 
SO Current opinion in cell biology, (1998 Aug) 10 (4) 427-8. 

Journal code: 8913428. ISSN: 0955-0674. 
CY United States 
DT (DIRECTORY) 
LA English 
FS Priority Journals 
EM 199812 

ED Entered STN: 19990115 

Last Updated on STN: 19990115 
Entered Medline: 19981222 

L2 ANSWER 12 OF 18 MEDLINE on STN 

Full Text 

AN 1998222651 MEDLINE 
DN PubMed ID: 9561838 
TI Cell regulation web alert. 
AU Toldo L; Lafont F 

CS MERCK KGaA, Bio- and Chemoinformatics, Darmstadt, Germany.. 
luca.toldo@merck.de 
SO Current opinion in cell biology, (1998 Apr) 10 (2) 155. 

Journal code: 8913428. ISSN: 0955-0674. 
CY United States 
DT (DIRECTORY) 
LA English 
FS Priority Journals 
EM 199806 

ED Entered STN: 19980625 

Last Updated on STN: 20000303 
Entered Medline: 19980618 

L2 ANSWER 13 OF 18 MEDLINE on STN DUPLICATE 2 

Full Text 

AN 97429414 MEDLINE 
DN PubMed ID: 9283764 

TI JaMBW 1.1: Java-based Molecular Biologists' Workbench. 
AU Toldo L I 

CS MERCK KGaA, Darmstadt, Germany.. luca.toldo@merck.de 
SO Computer applications in the biosciences : CABIOS, (1997 Aug) 13 ( 
475-6. 

Journal code: 8511758. ISSN: 0266-7061. 
CY ENGLAND: United Kingdom 
DT Journal; Article; (JOURNAL ARTICLE) 
LA English 
FS Priority Journals 
EM 199710 

ED Entered STN: 19971105 

Last Updated on STN: 19971105 
Entered Medline: 19971023 

L2 ANSWER 14 OF 18 MEDLINE on STN 

Full Text 

AN 97303153 MEDLINE 

DN PubMed ID: 9159087 

TI Web alert. Nucleus and gene expression. 

AU Pines J; Toldo L; Lafont F 

CS Wellcome/CRC Institute, Tennis Court Road, Cambridge, CB2 1QR, UK.. 
JP103@mole.bio.cam.ac.uk 
SO Current opinion in cell biology, (1997 Jun) 9 (3) 431. 

Journal code: 8913428. ISSN: 0955-0674. 
CY United States 
DT (DIRECTORY) 
LA English 
FS Priority Journals 
EM 199708 

ED Entered STN: 19970902 

Last Updated on STN: 20000303 
Entered Medline: 19970818 

L2 ANSWER 15 OF 18 MEDLINE on STN 

Full Text 

AN 97224252 MEDLINE 

DN PubMed ID: 9069269 

TI Web alert. Cell regulation. 

AU Pines J; Toldo L; Lafont F 

CS Wellcome/CRC Institute, Tennis Court Road, Cambridge, CB2 1QR, UK.. 
JP103@mole.bio.cam.ac.uk 
SO Current opinion in cell biology, (1997 Apr) 9 (2) 253-4. 

Journal code: 8913428. ISSN: 0955-0674. 
CY United States 
DT (DIRECTORY) 
LA English 
FS Priority Journals 
EM 199704 

ED Entered STN: 19970506 

Last Updated on STN: 20000303 
Entered Medline: 19970423 

L2 ANSWER 16 OF 18 MEDLINE on STN 

Full Text 

AN 97186505 MEDLINE 
DN PubMed ID: 9035697 
TI Web alert. Cytoskeleton. 
AU Lafont F; Toldo L 

CS Cell Biology Programme, European Molecular Biology Laboratory, 

Meyerhofstrasse 1, Heidelberg 69012, Germany.. lafont@EMBL-heidelberg.de 

SO Current opinion in cell biology, (1997 Feb) 9 (1) 118. 
Journal code: 8913428. ISSN: 0955-0674. 

CY United States 

DT (DIRECTORY) 

LA English 

FS Priority Journals 

EM 199702 

ED Entered STN: 19970306 

Last Updated on STN: 20000303 
Entered Medline: 19970227 

L2 ANSWER 17 OF 18 MEDLINE on STN 

Full Text 

AN 97186504 MEDLINE 
DN PubMed ID: 9035696 

TI The cell biologist and the World Wide Web. World Wide Web sites. 
AU Lafont F; Toldo L 

CS Cell Biology Programme, European Molecular Biology Laboratory, 

Meyerhofstrasse 1, Heidelberg 69012, Germany.. lafont@EMBL-heidelberg.de 

SO Current opinion in cell biology, (1997 Feb) 9 (1) 116-7. Ref: 2 
Journal code: 8913428. ISSN: 0955-0674. 
CY United States 

DT Journal; Article; (JOURNAL ARTICLE) 

General Review; (REVIEW) 

(REVIEW, TUTORIAL) 
LA English 
FS Priority Journals 
EM 199702 

ED Entered STN: 19970306 

Last Updated on STN: 19970306 
Entered Medline: 19970227 



L2 ANSWER 18 OF 18 MEDLINE on STN DUPLICATE 3 

Full Text 

AN 89275506 MEDLINE 
DN PubMed ID: 2659222 

TI Somatotropin as measured by a two-site time-resolved immunofluorometric 
assay. 

CM Comment in: Clin Chem. 1990 Feb; 36 (2): 402. PubMed ID: 2302801 
AU Strasburger C; Barnard G; Toldo L; Zarmi B; Zadik Z; Kowarski A; Kohen F 
CS Department of Hormone Research, Weizmann Institute of Science, Rehovot, 
Israel. 

SO Clinical chemistry, (1989 Jun) 35 (6) 913-7. 

Journal code: 9421549. ISSN: 0009-9147. 
CY United States 

DT Journal; Article; (JOURNAL ARTICLE) 

LA English 

FS Priority Journals 

EM 198907 

ED Entered STN: 19900309 

Last Updated on STN: 19900309 
Entered Medline: 19890721 

AB To date, many of the current criteria for diagnosis of somatotropin 
(growth hormone, GH) deficiency have been based upon measurement of this 
hormone by competitive radioimmunoassay (RIA) with use of polyclonal 
antibodies. In recent years, however, the development of hybridoma 
technology has led to the generation of various monoclonal antibodies 
(Mabs) to GH with different affinities and epitope specificities. 
Subsequently, these reagents have been used in the development of 
noncompetitive two-site immunometric assays (e.g., immunoradiometric 
assay; IRMA). In general, the values obtained for serum GH by IRMA have 
been lower than those obtained by RIA, because of the epitope-specificity 
profile of the Mabs in the IRMA. Attempting to obtain GH values 
numerically similar to those by RIA, we used a combination of Mabs to GH 
in developing and evaluating a two-site time-resolved immunofluorometric 
assay (IFMA) based on the streptavidin-biotin interaction. Fluorescence 
is proportional to concentration of analyte and is linearly related to 
concentration over the range 0.3 to 40 micrograms/L. The assay was 
satisfactory with respect to sensitivity, accuracy, and precision (CV less 
than 10% over the entire working range). In addition, the concentration 
of GH was determined by the IFMA and a competitive RIA in serum obtained 
from GH deficient and acromegalic patients. The pairing of antibodies in 
the IFMA gave numerical values that agreed well with those by RIA (r = 
0.97; n = 100). 
=> s e15-16 

L3 32 ("RIPPMANN F"/AU OR "RIPPMANN FRIEDRICH"/AU) 

=> duplicate remove l3 

DUPLICATE PREFERENCE IS 'MEDLINE, BIOSIS' 

KEEP DUPLICATES FROM MORE THAN ONE FILE? Y/ (N) :n 

PROCESSING COMPLETED FOR L3 

L4 25 DUPLICATE REMOVE L3 (7 DUPLICATES REMOVED) 

=> d 1-25 bib ab 



L4 ANSWER 1 OF 25 BIOSIS COPYRIGHT 2004 BIOLOGICAL ABSTRACTS INC. on STN 
Full Text 

AN 2003:266107 BIOSIS 

DN PREV200300266107 

TI Bicyclic amino acids. 

AU Diefenbach, Beate [Inventor, Reprint Author]; Goodman, Simon L. 
[Inventor]; Marz, Joachim [Inventor]; Raddatz, Peter [Inventor]; 
Rippmann, Friedrich [Inventor]; Wiesner, Matthias [Inventor] 
CS Darmstadt, Germany 

ASSIGNEE: Merck Patent Gesellschaft Mit, Darmstadt, Germany 
PI US 6559144 May 06, 2003 

SO Official Gazette of the United States Patent and Trademark Office Patents, 
(May 6 2003) Vol. 1270, No. 1. http://www.uspto.gov/web/menu/patdata.html. 
e-file. 
ISSN: 0098-1133 (ISSN print). 

DT Patent 

LA English 

ED Entered STN: 4 Jun 2003 

Last Updated on STN: 4 Jun 2003 

AB Compounds of the formula I ##STR1## in which X, Y, Z, R1, R2, R3, R4, R5, 
R7, R8, R11, m and n have the meanings stated in claim 1, and their 
physiologically acceptable salts can be used as integrin inhibitors, in 
particular for the prophylaxis and treatment of circulatory disorders, for 
thrombosis, myocardial infarct, coronary heart disease, arteriosclerosis, 
osteoporosis, for pathological processes maintained or propagated by 
angiogenesis, and in tumour therapy. 
Divalent cations and the relationship between alphaA and betaA domains in 
AB Integrins contain either one or two von Willebrand factor A-like domains, 
which are primary ligand and cation binding regions in the molecules. 
Here we examine the first structure of an A domain of a beta subunit, in 
alphanubeta3 and compare it to known A domain structures of alpha 
subunits. Ligand binding to immobilized alphanubeta3 domain is stimulated 
by Ca2+ rather than inhibited by it. Biochemical, cell biological and 
structural evidence suggests that the A domain is a major site of ligand 
interaction in alphanubeta3. The Arg-Gly-Asp based inhibitor cilengitide 
(EMD 121974) inhibits ligand interaction with transmembrane-truncated 
alphanubeta3 in the presence of either Ca2+ or Mn2+ ions, and does so with 
similar kinetics. The alphanubeta3 structure reveals that both the alphaA 
and betaA domains share common structural cores. But, in contrast to 
alphaA, the betaA domain has three cation binding sites, that are involved 
either directly or indirectly in ligand binding. Structural alignment of 
alphaA and betaA domains reveals additional loops unique only to the betaA 
domain and much evidence supports that these loops are important for 
ligand binding specificity and for the interaction between alpha and beta 
subunits. Since the positions of these loops are evolutionarily conserved 
but their primary sequence varies between the various betaA domains, they 
represent potential targets for dissecting functional diversity among 
integrins. 
AB The invention relates to novel cyclopeptides of the formula I 
cyclo-(Arg-B-Asp-D-E) I in which B, D, and E have the meanings defined 
herein, and their salts. These compounds act as integrin inhibitors and 
can be used, in particular, for the prophylaxis and treatment of disorders 
of the circulation and in tumor therapy. 
AB LIGSITE is a new program for the automatic and time-efficient detection of 
pockets on the surface of proteins that may act as binding sites for small 
molecule ligands. Pockets are identified with a series of simple 
operations on a cubic grid. Using a set of receptor-ligand complexes we 
show that LIGSITE is able to identify the binding sites of small molecule 
ligands with high precision. The main advantage of LIGSITE is its speed. 
Typical search times are in the range of 5 to 20 s for medium-sized 
proteins. LIGSITE is therefore well suited for identification of pockets 
in large sets of proteins (e.g., protein families) for comparative 
studies. For graphical display LIGSITE produces VRML representations of 
the protein-ligand complex and the binding site for display with a VRML 
viewer such as WebSpace from SGI. 
DT Article 
LA English 

ED Entered STN: 10 Dec 1996 

Last Updated on STN: 10 Dec 1996 
AB A new class of antithrombotic RGD-mimetics with a novel 

oxazolidinonemethyl scaffold was synthesized. High oral activity and 

bioavailability was found in this series of compounds. 
TI An active-site mutation in the human immunodeficiency virus type 1 
proteinase (PR) causes reduced PR activity and loss of PR-mediated 
AB Infectious retrovirus particles are derived from structural polyproteins 
which are cleaved by the viral proteinase (PR) during virion 
morphogenesis. Besides cleaving viral polyproteins, which is essential 
for infectivity, PR of human immunodeficiency virus (HIV) also cleaves 
cellular proteins and PR expression causes a pronounced cytotoxic effect. 
Retroviral PRs are aspartic proteases and contain two copies of the 
triplet Asp-Thr-Gly in the active center with the threonine adjacent to 
the catalytic aspartic acid presumed to have an important structural role. 
We have changed this threonine in HIV type 1 PR to a serine. The purified 
mutant enzyme had an approximately 5- to 10-fold lower activity against 
HIV type 1 polyprotein and peptide substrates compared with the wild-type 
enzyme. It did not induce toxicity on bacterial expression and yielded 
significantly reduced cleavage of cytoskeletal proteins in vitro. 
Cleavage of vimentin in mutant-infected T-cell lines was also markedly 
reduced. Mutant virus did, however, elicit productive infection of 
several T-cell lines and of primary human lymphocytes with no significant 
difference in polyprotein cleavage and with similar infection kinetics and 
titer compared with wild-type virus. The discrepancy between reduced 
processing in vitro and normal virion maturation can be explained by the 
observation that reduced activity was due to an increase in Km which may 
not be relevant at the high substrate concentration in the virus particle. 
This mutation enables us therefore to dissociate the essential function of 
PR in viral maturation from its cytotoxic effect. 
AB RGD-peptidomimetics are currently being investigated as a class of 
potential antithrombotics that antagonize the fibrinogen receptor, GP 
IIb/IIIa, on the surface of platelets. These mimetics are expected to 
have decisive advantages - such as higher activity and specificity, oral 
bioavailability and longer duration of action - over known 
antithrombotics. For further optimization in this respect, novel 
peptidomimetic GP IIb/IIIa antagonists with an oxazolidinonemethyl central 
building block were synthesized. This building block proved to be very 
versatile as an 'anchor' for structurally different C-termini and was the 
starting point for highly efficient and orally active compounds. 
AB A series of novel renin inhibitors containing 
2-(((3-phenylpropyl)phosphoryl)oxy)alkanoic acid moieties as P2-P3 
surrogates are presented. The P2-P3 mimetics were obtained from 
(omega-phenylalkyl)phosphinic acids 1a-c and 2-hydroxyalkanoic acid benzyl 
esters 2a-f by N,N'-dicyclohexylcarbodiimide-mediated coupling and 
subsequent oxidation with sodium metaperiodate. Ester cleavage of these 
derivatives and coupling with P1-P1' transition-state mimetics I-VII 
provided highly selective compounds with inhibitory potencies in the lower 
nanomolar range. Small renin inhibitors, such as analogues 8c and 8h with 
molecular weights of 539 and 537, respectively, could be prepared. These 
compounds exhibited IC50 values of about 20 nM against human plasma renin. 
Compound 7i was examined in vivo for its hypotensive effect. In 
salt-depleted cynomolgus monkeys, 7i inhibited plasma renin activity 
almost completely and lowered blood pressure after oral administration of 
a dose of 30 mg/kg. 
TI Renin inhibitors containing new P1-P1' dipeptide mimetics with 
heterocycles in P1'. 
AB A series of renin inhibitors containing new P1-P1' dipeptide mimetics are 
presented. The P1-P1' mimetics were obtained from 
(4S,5S)-3-(tert-butoxycarbonyl)-4-(cyclohexylmethyl)-5-[(omega-mesyloxy)alkyl]-2,2-dimethyloxazolidines 
5b, 9, and 11b by nucleophilic substitution of the 
mesylate groups with the sodium salts of mercapto- and 
hydroxyheterocycles. Removal of the protecting groups and stepwise 
acylations with amino acid derivatives provided renin inhibitors with a 
length of a tripeptide. Replacement of P2 histidine by other amino acids 
maintained or enhanced renin inhibitory potency. By alteration of P3 
phenylalanine, compounds with IC50 values in the nanomolar range and 
stability against chymotrypsin were obtained. Finally, the effect of the 
C-terminal heterocycle on the renin inhibition was studied. Compound XVII 
was examined in vivo for its hypotensive effects. In salt-depleted 
cynomolgus monkeys, XVII inhibited plasma renin activity and lowered blood 
pressure after oral administration of a dose of 10 mg/kg. 
TI A hypothetical model for the peptide binding domain of hsp70 based on the 
peptide binding domain of HLA. 
AB The sequences of the peptide binding domains of 33 70 kd heat shock 
proteins (hsp70) have been aligned and a consensus secondary structure has 
been deduced. Individual members showed no significant deviation from the 
consensus, which showed a beta 4 alpha motif repeated twice, followed by 
two further helices and a terminus rich in Pro and Gly. The repeated 
motif could be aligned with the secondary structure of the functionally 
equivalent peptide binding domain of human leucocyte antigen (HLA) class I 
maintaining equivalent residues in structurally important positions in the 
two families and a model was built based on this alignment. The 
interaction of this domain with the ATP domain is considered. The overall 
model is shown to be consistent with the properties of products of 
chymotryptic cleavage. 
TI Biological assays for irritant, tumor-initiating and -promoting 
activities. III. Computer-assisted management and validation of biodata 
generated by standardized initiation/promotion protocols in skin of mice. 
AB The initiation/promotion standard protocol 28 (protocol 28), developed and 
used previously as an experimental model to verify the cancerogenic 
process of initiation/promotion in mouse skin, was revised in three 
aspects: (a) statistically it was shown sufficient to use, per promoter 
dose group, 16 colony outbred female NMRI mice; (b) by weekly individual 
records of tumor response (and health status) of each mouse in a dose 
group, cumulative tumor incidences (and mean and extreme body weights) are 
determined; from these data the collective records (tumor response, health 
status), the only data accessible from protocol 28, may be generated in 
addition; (c) the details of dose groups and all data on tumor response 
and health status are processed by computer using the program package 
PAPILLOM. The latter was developed specifically for this purpose, is 
written in the programming language APL and designed for easy handling by 
staff of animal laboratories. The program package calculates, from the 
individual records per promoter dose group, cumulative tumor incidences 
(and survival data) with confidence limits for any one exposure time, and 
the package may be linked to programs for statistical validations. In 
addition, from the collective records it calculates the tumor rates, tumor 
yields and survival rates for any one exposure time. These data, obtained 
by either of the standard protocols (16 or 28), are fully comparable. For 
pure compounds they may be used to calculate semiquantitative 
tumor-promoting potencies. These values for more than 80 polyfunctional
diterpenes of the tigliane, ingenane and daphnane type, scattered in or 
calculated from previous papers, together with their irritancies, were 
compiled. Within recent years, computer-assisted standard protocol 16 has 
been used to handle and evaluate about 1000 promoter dose groups. 
Protocol 16 allows one to extract and utilize more and better 
toxicological information on tumor response and health status from any one 
dose group, utilizing significantly fewer experimental animals than 
required by protocol 28. Thus, the computer-assisted standard protocol 16 
optimizes the utility of the experimental model of mouse skin for the 
amount, quality and management of experimental data as well as for the 
requirements of animal protection.
TI Visualization of structural similarity in proteins.
AU Rippmann F; Taylor W R
AB Two new methods for the visualization of structural similarity in proteins
with known three-dimensional structures are presented. They are based on 
the degree of equivalence of alpha-carbon pairs in two proteins. The 
quantitative measure for residue equivalence is the comparison score 
generated using the sequence and structure alignment method of Taylor and 
Orengo, which is based on the comparison of interatomic distances (and 
other properties that can be defined on a residue basis). The first
method uses information on corresponding alpha-carbon positions to display 
vectors joining these structurally equivalent residues. These vectors can 
be defined as target constraints, and their minimization "bends" the two 
proteins toward a common average structure. In the average structure the 
corresponding residues virtually superpose, while insertions and deletions 
become clearly visible. The second method uses the comparison scores to 
perform a weighted least-squares fit of the two structures. It is further 
used to color code the two structures according to the score value, i.e., 
their similarity, on a continuous scale from red to blue. Examples of the 
methods for the comparison of flavodoxin, chemotaxis Y protein and 
L-arabinose-binding protein are given. 
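
Editor's illustration (not part of the record): the second method colors each residue by its comparison score on a continuous red-to-blue scale. A minimal sketch of such a linear color ramp follows. The function name, the RGB encoding, and the choice of red for the low end of the score range are assumptions; the abstract does not say which end of the scale is red.

```python
def score_to_rgb(score, lo, hi):
    """Map a comparison score linearly onto a red-to-blue ramp.

    Assumption: lo maps to pure red and hi to pure blue; the abstract
    does not specify the direction. Scores outside [lo, hi] are clamped.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    t = (score - lo) / (hi - lo)      # position on the scale, 0..1
    t = min(1.0, max(0.0, t))         # clamp out-of-range scores
    red = round(255 * (1.0 - t))
    blue = round(255 * t)
    return (red, 0, blue)
```

Applying the function to every residue's score gives one color per residue, which a molecular viewer could then use for display.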
AB In a previous article, a multistage model of carcinogenesis was introduced
that takes into account the role of DNA damage, DNA repair, and cell 
replication on the incidence of malignancies. For this model the number 
of detectable clones of initiated cells is derived and model parameters 
are estimated using data arising from a two-stage skin-painting experiment 
in mice. The data from this experiment are interpretable in terms of the 
cellular events involved in initiation and promotion.
Last Updated on STN: 24 Jul 1990

AB Phorbol esters are polyfunctional agents which influence the carcinogenic
process via a receptor mechanism. Two structural elements of phorbol
esters seem to be responsible for tumor promoting activity. One element
is formed of certain hydrophilic groups which are supposed to be
responsible for the specific receptor binding. The other element consists
of hydrophobic groups which should rather unspecifically be responsible
for partition and transport between biological phases. In order to
differentiate between these two structural elements, the numerical
relative tumor promoting activity of phorbol diesters (a new activity in
QSAR) was calculated from rodent skin painting experiments and its
dependence on hydrophobicity was modeled using the parabolic and the
bilinear model. The results show that the tumor promoting activity of
aliphatic phorbol 12,13-diesters can adequately be described by
hydrophobicity only. This indicates that the structural element
'aliphatic 12,13-diesters' is rather unspecifically responsible for
partition and transport between biological phases. A medium 
hydrophobicity caused by aliphatic ester chains in position 12 and 13 of 
phorbol is a sufficient condition for high promoting activity as long as 
the other hydrophilic groups are not removed, acylated or otherwise 
changed. 
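
Editor's illustration (not part of the record): the abstract names the parabolic and the bilinear model but gives neither equation nor fitted coefficients. The sketch below assumes the textbook forms usually meant by those names in QSAR, the Hansch parabolic model and the Kubinyi bilinear model, with log P as the hydrophobicity descriptor. All function names and coefficient values are hypothetical.

```python
import math

def parabolic(log_p, a, b, c):
    """Parabolic model (assumed form): activity = a + b*logP - c*logP**2."""
    return a + b * log_p - c * log_p ** 2

def parabolic_optimum(b, c):
    """logP at which the parabolic model peaks (requires c > 0)."""
    if c <= 0:
        raise ValueError("c must be positive for a maximum to exist")
    return b / (2.0 * c)

def bilinear(log_p, a, b, beta, c):
    """Bilinear model (assumed form): a*logP - b*log10(beta*P + 1) + c."""
    return a * log_p - b * math.log10(beta * 10.0 ** log_p + 1.0) + c

def bilinear_optimum(a, b, beta):
    """logP at which the bilinear model peaks (requires b > a > 0)."""
    if not (b > a > 0 and beta > 0):
        raise ValueError("a maximum requires b > a > 0 and beta > 0")
    return math.log10(a / (beta * (b - a)))
```

Both models rise with hydrophobicity, pass through an optimum, and fall again. That is the behavior behind the abstract's statement that a medium hydrophobicity suffices for high promoting activity.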
TI The structure-function relationship in the clostripain family of
peptidases.
Last Updated on STN: 20040408
Entered Medline: 20040407 

In this study we investigate the active-site structure and the catalytic 
mechanism of clostripain by using a combination of three separate 
techniques: affinity labelling, site-directed mutagenesis and molecular 
modelling. A benzamidinyl-diazo dichlorotriazine dye (BDD) was shown to 
act as an efficient active site-directed affinity label for Clostridium 
histolyticum clostripain. The enzyme, upon incubation with BDD in 0.1 M
Hepes/NaOH buffer pH 7.6, exhibits a time-dependent loss of activity. The
rate of inactivation exhibits a nonlinear dependence on the BDD
concentration, which can be described by reversible binding of dye to the
enzyme prior to the irreversible reaction. The dissociation constant of
the reversible formation of an enzyme-BDD complex is KD = 74.6 +/- 2.1
microM and the maximal rate constant of inactivation is k3 = 0.21
min(-1). Effective protection against inactivation by BDD is provided by
the substrate N-benzoyl-L-arginine ethyl ester (BAEE). Cleavage of
BDD-modified enzyme with trypsin and subsequent separation of peptides
by reverse-phase HPLC gave only one modified peptide. Amino acid
sequencing of the modified tryptic peptide revealed the target site of BDD
reaction to be His176. Site-directed mutagenesis was used to study
further the functional role of His176. The mutant His176Ala enzyme
exhibited zero activity against BAEE. Together with previous data, these
results confirm that a catalytic dyad of His176 and Cys231 is responsible
for cysteine peptidase activity in the C11 peptidase family. A molecular
model of the catalytic domain of clostripain was constructed using a
manually extended fold recognition-derived alignment with caspases. A
rigorous iterative modelling scheme resulted in an objectively sound
model which points to Asp229 as responsible for defining the strong
substrate specificity for Arg at the P1 position. Two possible binding
sites for the calcium required for auto-activation could be located.
Database searches show that clostripain homologues are not confined to
bacterial lineages and reveal an intriguing variety of domain
architectures.
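
Editor's illustration (not part of the record): the abstract describes reversible dye binding followed by an irreversible step, and reports KD = 74.6 microM and k3 = 0.21 min(-1). A saturable scheme of this kind is commonly summarized as k_obs = k3*[I] / (KD + [I]). The sketch below assumes that standard rate law, with the dye in large excess over the enzyme; the abstract itself does not print the equation. Function names are hypothetical.

```python
import math

K_D_MICROMOLAR = 74.6   # dissociation constant reported in the abstract
K3_PER_MIN = 0.21       # maximal inactivation rate constant reported in the abstract

def k_obs(dye_um, k_d=K_D_MICROMOLAR, k3=K3_PER_MIN):
    """Observed pseudo-first-order inactivation rate (min^-1) at a dye
    concentration in microM, assuming k_obs = k3*[I] / (KD + [I])."""
    if dye_um < 0:
        raise ValueError("concentration must be non-negative")
    return k3 * dye_um / (k_d + dye_um)

def residual_activity(dye_um, minutes):
    """Fraction of enzyme activity left after the given incubation time."""
    return math.exp(-k_obs(dye_um) * minutes)
```

Under this assumption the rate is half-maximal when the dye concentration equals KD, and it approaches k3 at saturating dye. That is the nonlinear dependence the abstract reports.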
TI In-depth analysis of the thylakoid membrane proteome of Arabidopsis
thaliana chloroplasts: new proteins, new functions, and a plastid proteome
database.
AU Friso Giulia; Giacomelli Lisa; Ytterberg A Jimmy; Peltier Jean-Benoit;
Rudella Andrea; Sun Qi; Wijk Klaas J van
CS Department of Plant Biology, Cornell University, Ithaca, New York 14853,
USA.
SO Plant cell, (2004 Feb) 16 (2) 478-99.
Journal code: 9208688. ISSN: 1040-4651.
LA English
ED Entered STN: 20040210
Last Updated on STN: 20040625
Entered Medline: 20040623

AB An extensive analysis of the Arabidopsis thaliana peripheral and integral 
thylakoid membrane proteome was performed by sequential extractions with 
salt, detergent, and organic solvents, followed by multidimensional 
protein separation steps (reverse-phase HPLC and one- and 
two-dimensional electrophoresis gels), different enzymatic and 
nonenzymatic protein cleavage techniques, mass spectrometry, and 
bioinformatics. Altogether, 154 proteins were identified, of which 76
(49%) were alpha-helical integral membrane proteins. Twenty-seven new
proteins without known function but with predicted chloroplast transit 
peptides were identified, of which 17 (63%) are integral membrane 
proteins. These new proteins, likely important in thylakoid biogenesis, 
include two rubredoxins, a potential metallochaperone, and a new DnaJ-like 
protein. The data were integrated with our analysis of the 
lumenal-enriched proteome. We identified 83 out of 100 known proteins of 
the thylakoid localized photosynthetic apparatus, including several new 
paralogues and some 20 proteins involved in protein insertion, assembly, 
folding, or proteolysis. An additional 16 proteins are involved in 
translation, demonstrating that the thylakoid membrane surface is an 
important site for protein synthesis. The high coverage of the 
photosynthetic apparatus and the identification of known hydrophobic 
proteins with low expression levels, such as cpSecE, Ohp1, and Ohp2,
indicate an excellent dynamic resolution of the analysis. The
sequential extraction process proved very helpful to validate
transmembrane prediction. Our data also were cross-correlated to
chloroplast subproteome analyses by other laboratories. All data are
deposited in a new curated plastid proteome database (PPDB) with
multiple search functions (http://cbsusrv01.tc.cornell.edu/users/ppdb/).
This PPDB will serve as an expandable resource for the plant community.
Full Text
TI Three monophyletic superfamilies account for the majority of the known
glycosyltransferases.
AU Liu Jing; Mushegian Arcady 

CS Stowers Institute for Medical Research, 1000 E. 50th Street, Kansas City, 
MO 64110, USA. 

SO Protein science : a publication of the Protein Society, (2003 Jul) 12 (7) 
1418-31. 

Journal code: 9211750. ISSN: 0961-8368. 
CY United States 

DT Journal; Article; (JOURNAL ARTICLE) 
LA English
ED Entered STN: 20030626
Last Updated on STN: 20031218

AB Sixty-five families of glycosyltransferases (EC 2.4.x.y) have been
recognized on the basis of high-sequence similarity to a founding member
with experimentally demonstrated enzymatic activity. Although distant
sequence relationships between some of these families have been
reported, the natural history of glycosyltransferases is poorly
understood. We used iterative searches of sequence databases,
motif extraction, structural comparison, and analysis of completely
sequenced genomes to track the origins of modern-type
glycosyltransferases. We show that >75% of recognized glycosyltransferase
families belong to one of only three monophyletic superfamilies of
proteins, namely, (1) a recently described GPGTF/GT-B superfamily; (2) a 
nucleoside-diphosphosugar transferase (GT-A) superfamily, which is 
characterized by a DxD sequence signature and also includes 
nucleotidyltransferases; and (3) a GT-C superfamily of integral membrane 
glycosyltransferases with a modified DxD signature in the first 
extracellular loop. Several developmental regulators in Metazoans, 
including Fringe and Egghead homologs, belong to the second superfamily. 
Interestingly, Tout-velu/Exostosin family of developmental proteins found 
in all multicellular eukaryotes, contains separate domains belonging to 
the first and the second superfamilies, explaining multiple
glycosyltransferase activities in one protein.
Entered Medline: 20020211

AB Guanine nucleotide-binding protein-coupled receptors (GPCRs) comprise
large and diverse gene families in fungi, plants, and the animal kingdom.
GPCRs appear to share a common structure with 7 transmembrane segments,
but sequence similarity is minimal among the most distant GPCRs. To 
reevaluate the question of evolutionary relationships among the disparate 
GPCR families, this study takes advantage of the dramatically increased 
number of cloned GPCRs. Sequences were selected from the National 
Center for Biotechnology Information (NCBI) nonredundant peptide 
database using iterative BLAST (Basic Local Alignment Search Tool) 
searches to yield a database of approximately 1700 GPCRs and unrelated
membrane proteins as controls, divided into 34 distinct clusters. For 
each cluster, separate position-specific matrices were established to 
optimize sequence comparisons among GPCRs. This approach resulted in 
significant alignments between distant GPCR families, including receptors 
for the biogenic amine/peptide, VIP/secretin, cAMP, STE3/MAP3 fungal 
pheromones, latrophilin, developmental receptors frizzled and smoothened, 
as well as the more distant metabotropic glutamate receptors, the
STE2/MAM2 fungal pheromone receptors, and GPR1, a fungal glucose receptor.
On the other hand, alignment scores between these recognized GPCR clades
with p40 (putative GPCR) and pm1 (putative GPCR), as well as
bacteriorhodopsins, failed to support a finding of homology. This study 
provides a refined view of GPCR ancestry and serves as a reference 
database with hyperlinks to other sources. Moreover, it may facilitate 
database annotation and the assignment of orphan receptors to GPCR 
families.
[Reprint author]

CS European Molecular Biology Laboratory, Meyerhofstr. 1, Heidelberg, 69012,
SO Journal of Molecular Biology, (May 5, 2000) Vol. 298, No. 3, pp. 521-537.
print.
DT Article
LA English
ED Entered STN: 6 Sep 2000
Last Updated on STN: 8 Jan 2002

AB Short protein repeats, frequently with a length between 20 and 40
residues, represent a significant fraction of known proteins. Many
repeats appear to possess high amino acid substitution rates and thus 
recognition of repeat homologues is highly problematic. Even if the 
presence of a certain repeat family is known, the exact locations and the 
number of repetitive units often cannot be determined using current 
methods. We have devised an iterative algorithm based on optimal and 
sub-optimal score distributions from profile analysis that estimates the 
significance of all repeats that are detected in a single sequence. 
This procedure allows the identification of homologues at alignment scores
lower than the highest optimal alignment score for non-homologous
sequences. The method has been used to investigate the occurrence of 
eleven families of repeats in Saccharomyces cerevisiae, Caenorhabditis 
elegans and Homo sapiens accounting for 1055, 2205 and 2320 repeats, 
respectively. For these examples, the method is both more sensitive and 
more selective than conventional homology search procedures. The method 
allowed the detection in the SwissProt database of more than 2000 
previously unrecognised repeats belonging to the 11 families. In 
addition, the method was used to merge several repeat families that 
previously were supposed to be distinct, indicating common phylogenetic 
origins for these families.
AN 2003:135603 BIOSIS
TI Search for structural similarity in proteins.
AU Leluk, Jacek; Konieczny, Leszek; Roterman, Irena [Reprint Author]
CS Department of Biostatistics and Medical Informatics, Collegium Medicum,
Jagiellonian University, Kopernika 17, 31-501, Krakow, Poland
SO Bioinformatics (Oxford), (January 2003) Vol. 19, No. 1, pp. 117-124.
print.

ISSN: 1367-4803. 
DT Article 
LA English 

ED Entered STN: 12 Mar 2003 

Last Updated on STN: 12 Mar 2003 

AB Motivation: The expanding protein sequence and structure databases
await methods allowing rapid similarity search. Geometric
parameters - dihedral angle between two sequential peptide bond planes (V)
and radius of curvature (R) as they appear in pentapeptide fragments in
polypeptide chains - are proposed for use in evaluating structural
similarity in proteins (VeaR). The parabolic (empirical) function
expressing the radius of curvature's dependence on the V-angle in model 
polypeptides is altered in real proteins in a form characteristic for a 
particular protein. This can be used as a criterion for judging 
similarity. Results: A structural comparison of proteins representing a 
wide spectrum of structures was assessed versus sequence similarity 
analysis based on the genetic semihomology algorithm. The term 'consensus 
structure', analogous to 'consensus sequence', was introduced for the
serpine family.
AN 2003243846 MEDLINE
DN PubMed ID: 12766406
TI PACES: Protein sequential assignment by computer-assisted exhaustive
search.
AU Coggins Brian E; Zhou Pei
CS Department of Biochemistry, Duke University Medical Center, Durham, NC
27710, U.S.A.
NC GM 51310 (NIGMS)
SO Journal of biomolecular NMR, (2003 Jun) 26 (2) 93-111.
Journal code: 9110829. ISSN: 0925-2738.
CY Netherlands
Journal; Article; (JOURNAL ARTICLE)
ED Entered STN: 20030528
Last Updated on STN: 20040305
Entered Medline: 20040304

AB A crucial step in determining solution structures of proteins using 
nuclear magnetic resonance (NMR) spectroscopy is the process of 
sequential assignment, which correlates backbone resonances to 
corresponding residues in the primary sequence of a protein, today, 
typically using data from triple-resonance NMR experiments. Although the 
development of automated approaches for sequential assignment has 
greatly facilitated this process, the performance of these programs is 
usually less satisfactory for large proteins, especially in the cases of 
missing connectivity or severe chemical shift degeneracy. Here, we report 
the development of a novel computer-assisted method for sequential 
assignment, using an algorithm that conducts an exhaustive search of all 
spin systems both for establishing sequential connectivities and then 
for assignment. By running the program iteratively with user 
intervention after each cycle, ambiguities in the assignments can be 
eliminated efficiently and backbone resonances can be assigned rapidly. 
The efficiency and robustness of this approach have been tested with 27 
proteins of sizes varying from 76 amino acids to 723 amino acids, and with 
data of varying qualities, using experimental data for three proteins, and 
published assignments modified with simulated noise for the other 24. The 
complexity of sequential assignment with regard to the size of the 
protein, the completeness of NMR data sets, and the uncertainty in 
resonance positions has been examined.
detection.
AU Madera Martin; Gough Julian
CS MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, UK.
mm238@mrc-lmb.cam.ac.uk
SO Nucleic acids research, (2002 Oct 1) 30 (19) 4321-8.
AB Profile hidden Markov models (HMMs) are amongst the most successful
procedures for detecting remote homology between proteins. There are two
popular profile HMM programs, HMMER and SAM. Little is known about 
their performance relative to each other and to the recently improved 
version of PSI-BLAST. Here we compare the two programs to each other 
and to non-HMM methods, to determine their relative performance and the 
features that are important for their success. The quality of the 
multiple sequence alignments used to build models was the most important 
factor affecting the overall performance of profile HMMs. The SAM T99 
procedure is needed to produce high quality alignments automatically, and 
the lack of an equivalent component in HMMER makes it less complete as a 
package. Using the default options and parameters as would be expected of 
an inexpert user, it was found that from identical alignments SAM 
consistently produces better models than HMMER and that the relative 
performance of the model-scoring components varies. On average, HMMER was 
found to be between one and three times faster than SAM when searching 
databases larger than 2000 sequences, SAM being faster on smaller 
ones. Both methods were shown to have effective low complexity and repeat 
sequence masking using their null models, and the accuracy of their 
E-values was comparable. It was found that the SAM T99 iterative 
database search procedure performs better than the most recent version 
of PSI-BLAST, but that scoring of PSI-BLAST profiles is more than 30 times 
faster than scoring of SAM models.
Last Updated on STN: 20021211
Entered Medline: 20030819
AB ADP-ribosyltransferases including toxins secreted by Vibrio cholerae,
Pseudomonas aeruginosa, and other pathogenic bacteria inactivate the
function of human target proteins by attaching ADP-ribose onto a critical
amino acid residue. Cross-species polymerase chain reaction (PCR) and
database mining identified the orthologs of these ADP-ribosylating
toxins in humans and the mouse. The human genome contains four functional
toxin-related ADP-ribosyltransferase genes (ARTs) and two related
intron-containing pseudogenes; the mouse has six functional orthologs. 
The human and mouse ART genes map to chromosomal regions with conserved 
linkage synteny. The individual ART genes reveal highly restricted 
expression patterns, which are largely conserved in humans and the mouse. 
We confirmed the predicted extracellular location of the ART proteins by 
expressing recombinant ARTs in insect cells. Two human and four mouse 
ARTs contain the active site motif (R-S-EXE) typical of arginine-specific
ADP-ribosyltransferases and exhibit the predicted enzyme activities. Two
other human ARTs and their murine orthologues deviate in the active site 
motif and lack detectable enzyme activity. Conceivably, these ARTs may 
have acquired a new specificity or function. The position-sensitive 
iterative database search program PSI-BLAST connected the 
mammalian ARTs with most known bacterial ADP-ribosylating toxins. In 
contrast, no related open reading frames occur in the four completed 
genomes of lower eucaryotes (yeast, worm, fly, and mustard weed).
Interestingly, these organisms also lack genes for ADP-ribosylhydrolases,
the enzymes that reverse protein ADP-ribosylation. This suggests that the
two enzyme families that catalyze reversible mono-ADP-ribosylation either
were lost from the genomes of these nonchordata eucaryotes or were subject
to horizontal gene transfer between kingdoms.
AB The Conserved Domain Architecture Retrieval Tool (CDART) performs
similarity searches of the NCBI Entrez Protein Database based on 
domain architecture, defined as the sequential order of conserved 
domains in proteins. The algorithm finds protein similarities across 
significant evolutionary distances using sensitive protein domain profiles 
rather than by direct sequence similarity. Proteins similar to a query 
protein are grouped and scored by architecture. Relying on domain 
profiles allows CDART to be fast, and, because it relies on annotated 
functional domains, informative. Domain profiles are derived from several 
collections of domain definitions that include functional annotation. 
Searches can be further refined by taxonomy and by selecting domains of
interest. CDART is available at
http://www.ncbi.nlm.nih.gov/Structure/lexington/lexington.cgi.
AB MOTIVATION: Many sequences, and in some cases structures, of proteins
that induce an allergic response in atopic individuals have been 
determined in recent years. This data indicates that allergens, 
regardless of source, fall into discrete protein families. Similarities
in the sequence may explain clinically observed cross-reactivities 
between different biological triggers. However, previously available 
allergy databases group allergens according to their biological sources, 
or observed clinical cross-reactivities, without providing data about the 
proteins. A computer-aided data mining system is needed to compare the 
sequential and structural details of known allergens. This information 
will aid in predicting allergenic cross-responses and eventually in 
determining possible common characteristics of IgE recognition. RESULTS: 
The new web-based Structural Database of Allergenic Proteins (SDAP) 
permits the user to quickly compare the sequence and structure of 
allergenic proteins. Data from literature sources and previously existing 
lists of allergens are combined in a MySQL interactive database with a 
wide selection of bioinformatics applications. SDAP can be used to 
rapidly determine the relationship between allergens and to screen novel 
proteins for the presence of IgE or T-cell epitopes they may share with 
known allergens. Further, our novel similarity search method, based on 
five dimensional descriptors of amino acid properties, can be used to scan 
the SDAP entries with a peptide sequence. For example, when a known IgE 
binding epitope from shrimp tropomyosin was used as a query, the method 
rapidly identified a similar sequence in known shellfish and insect 
allergens. This prediction of cross-reactivity between allergens is 
consistent with clinical observations. AVAILABILITY: SDAP is available on
the web at http://fermi.utmb.edu/SDAP/index.html
AB The computational detection of novel selenoproteins in genomic sequences
is usually achieved through identification of SECIS, a conserved secondary
structure element found in the 3' UTR of animal selenoprotein mRNAs.
Previous studies have used "descriptors" specifying the number of base 
pairs and the conserved nucleotides in SECIS to identify this element. A 
major drawback of the "descriptor" approach is that the number of 
detections in current genomic or transcript databases largely exceeds 
the number of true selenoproteins. In this study, we use instead the 
ERPIN program to detect SECIS elements. ERPIN is based on a lod-score 
profile algorithm that uses a training-set of aligned RNA sequences as 
input. From an initial alignment of 44 animal SECIS sequences, we
performed a series of iterative searches in which the training set was 
progressively enriched up to 117 confirmed SECIS elements, from a large 
collection of metazoan species. About 200 high-scoring candidates were 
also detected. We show that ERPIN scores for these candidates can be 
converted into expect values, thus enabling their statistical evaluation. 
The most interesting SECIS candidates are presented.
AB The ribosomal differentiation of medical micro-organisms (RIDOM) web
server, first described by Harmsen et al. [Harmsen, D., Rothganger, J.,
Singer, Albert, J. and Frosch, M. (1999) Lancet, 353, 291], is an
evolving electronic resource designed to provide micro-organism 
differentiation services for medical identification needs. The diagnostic 
procedure begins with a specimen partial small subunit ribosomal DNA (16S 
rDNA) sequence. Resulting from a similarity search, a species or 
genus name for the specimen in question will be returned. Where the first 
results are ambiguous or do not define to species level, hints for further 
molecular, i.e. internal transcribed spacer, and conventional phenotypic 
differentiation will be offered ('sequential and polyphasic approach').
Additionally, each entry in RIDOM contains detailed medical and taxonomic 
information linked, context-sensitive, to external World Wide Web 
services. Nearly all sequences are newly determined and the sequence 
chromatograms are available for intersubjective quality control. 
Similarity searches are now also possible by direct submission of trace 
files (ABI or SCF format). Based on the PHRED/PHRAP software, error
probability measures are attached to each predicted nucleotide base and
visualised with a new 'Trace Editor'. The RIDOM web site is directly
accessible on the World Wide Web at http://www.ridom.de/. The email
address for questions and comments is webmaster@ridom.de.
AB PSI-BLAST is an iterative program to search a database for
proteins with distant similarity to a query sequence. We investigated
over a dozen modifications to the methods used in PSI-BLAST, with the goal 
of improving accuracy in finding true positive matches. To evaluate 
performance we used a set of 103 queries for which the true positives in 
yeast had been annotated by human experts, and a popular measure of 
retrieval accuracy (ROC) that can be normalized to take on values between 
0 (worst) and 1 (best). The modifications we consider novel improve the
ROC score from 0.758 +/- 0.005 to 0.895 +/- 0.003. This does not include 
the benefits from four modifications we included in the 'baseline' 
version, even though they were not implemented in PSI-BLAST version 2.0. 
The improvement in accuracy was confirmed on a small second test set. 
This test involved analyzing three protein families with curated lists of 
true positives from the non-redundant protein database. The 
modification that accounts for the majority of the improvement is the use, 
for each database sequence, of a position-specific scoring system 
tuned to that sequence's amino acid composition. The use of 
composition-based statistics is particularly beneficial for large-scale 
automated applications of PSI-BLAST. 
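
Editor's illustration (not part of the record): the abstract evaluates retrieval with a ROC measure normalized to lie between 0 and 1 but gives no formula. The sketch below assumes the commonly used truncated ROC_n score: for each of the first n false positives in the ranked hit list, count the true positives ranked above it, then divide the sum by n times the total number of true positives. The function name and the handling of lists that end before n false positives are assumptions.

```python
def roc_n(ranked_labels, n, total_true):
    """Truncated ROC score in [0, 1].

    ranked_labels: booleans ordered from best to worst score,
                   True for a true positive, False for a false positive.
    n:             number of false positives to scan before stopping.
    total_true:    number of true positives that exist for the query.
    """
    if n <= 0 or total_true <= 0:
        raise ValueError("n and total_true must be positive")
    true_seen = 0
    false_seen = 0
    area = 0
    for is_true in ranked_labels:
        if is_true:
            true_seen += 1
        else:
            false_seen += 1
            area += true_seen          # true positives ranked above this false hit
            if false_seen == n:
                break
    # Assumption: if the list ends early, the missing false positives are
    # treated as ranked below every hit that was seen.
    area += (n - false_seen) * true_seen
    return area / (n * total_true)
```

A ranking that places every true positive ahead of the first false positive scores 1, and one that places n false positives first scores 0.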
AB We present here a new approach to the problem of defining RNA signatures
and finding their occurrences in sequence databases. The proposed 
method is based on "secondary structure profiles". An RNA sequence 
alignment with secondary structure information is used as an input. Two 
types of weight matrices/profiles are constructed from this alignment: 
single strands are represented by a classical lod-scores profile while 
helical regions are represented by an extended "helical profile" 
comprising 16 lod-scores per position, one for each of the 16 possible
base-pairs. Database searches are then conducted using a simultaneous
search for helical profiles and dynamic programming alignment of single 
strand profiles. The algorithm has been implemented into a new 
software, ERPIN, that performs both profile construction and database 
search. Applications are presented for several RNA motifs. The 
automated use of sequence information in both single-stranded and 
helical regions yields better sensitivity/specificity ratios than 
descriptor-based programs. Furthermore, since the translation of 
alignments into profiles is straightforward with ERPIN, iterative 
searches can easily be conducted to enrich collections of homologous RNAs.
Copyright 2001 Academic Press. 
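
Editor's illustration (not part of the record): the abstract represents single strands by a classical lod-score profile built from an alignment. The sketch below shows one common way to build such a profile: per-column log2 odds of observed base frequency over background frequency, with a pseudocount. It covers only the single-strand case, not the 16-value helical profile, and it is not ERPIN's actual code. The function names, the pseudocount, and the background frequencies are assumptions.

```python
import math
from collections import Counter

def lod_profile(alignment, background, pseudocount=1.0):
    """Build a lod-score profile from equal-length aligned sequences.

    Returns one dict per alignment column, mapping each base to
    log2(observed frequency / background frequency).
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must be non-empty and equally long")
    alphabet = sorted(background)
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(alphabet)
        profile.append({
            base: math.log2(((counts[base] + pseudocount) / total) / background[base])
            for base in alphabet
        })
    return profile

def score(profile, sequence):
    """Sum the per-position lod-scores of a candidate of profile length."""
    if len(sequence) != len(profile):
        raise ValueError("sequence length must match profile length")
    return sum(column[base] for column, base in zip(profile, sequence))
```

Because a profile is computed directly from an alignment, newly confirmed hits can be added to the alignment and the profile rebuilt. That is the iterative enrichment the abstract describes.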
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AB MOTIVATION: SAM-T99 is an iterative hidden Markov model-based method for 
finding proteins similar to a single target sequence and aligning them. 
One of its main uses is to produce multiple alignments of homologs of the 
target sequence. Previous tests of SAM-T99 and its predecessors have 
concentrated on the quality of the searches performed, not on the 
quality of the multiple alignment. In this paper we report on tests of 
multiple alignment quality, comparing SAM-T99 to the standard multiple 
aligner, CLUSTALW. RESULTS: The paper evaluates the multiple-alignment 
aspect of the SAM-T99 protocol, using the BAliBASE benchmark alignment 
database. On these benchmarks, SAM-T99 is comparable in accuracy with 
ClustalW. AVAILABILITY: The SAM-T99 protocol can be run on the web at 
http://www.cse.ucsc.edu/research/compbio/HMM-apps/T99-query.html and the 
alignment tune-up option described here can be run at 
http://www.cse.ucsc.edu/research/compbio/HMM-apps/T99-tuneup.html. The 
protocol is also part of the standard SAM suite of tools. 
http://www.cse.ucsc.edu/research/compbio/sam/ 

L10 ANSWER 12 OF 29 MEDLINE on STN DUPLICATE 8 


AN 2001610257 MEDLINE 
DN PubMed ID: 11684083 

TI In silico analysis of the tRNA:m1A58 methyltransferase family: 
homology-based fold prediction and identification of new members from 
Eubacteria and Archaea. 

AU Bujnicki J M 

CS Bioinformatics Laboratory, International Institute of Molecular and Cell 
Biology, ul. ks. Trojdena 4, 02-109 Warsaw, Poland. info@bioinfo.pl 

SO FEBS letters, (2001 Oct 26) 507 (2) 123-7. 
Journal code: 0155157. ISSN: 0014-5793. 

CY Netherlands 

DT Journal; Article; (JOURNAL ARTICLE) 

LA English 

FS Priority Journals 

EM 200112 

ED Entered STN: 20011102 

Last Updated on STN: 20020123 
Entered Medline: 20011211 

AB The amino acid sequences of Gcd10p and Gcd14p, the two subunits of the 
tRNA:(1-methyladenosine-58; m(1)A58) methyltransferase (MTase) of 
Saccharomyces cerevisiae, have been analyzed using iterative sequence 
database searches and fold recognition programs. The results 
suggest that the 'catalytic' Gcd14p and 'substrate binding' Gcd10p are 
related to each other and to a group of prokaryotic open reading frames, 
which were previously annotated as hypothetical protein isoaspartate 
MTases in sequence databases. It is predicted that the prokaryotic 
proteins are genuine tRNA:m(1)A MTases based on similarity of their 
predicted active site to the Gcd14p family. In addition to the MTase 
domain, an additional domain was identified in the N-terminus of all these 
proteins that may be involved in interaction with tRNA. These results 
suggest that the eukaryotic tRNA:m(1)A58 MTase is a product of gene 
duplication and divergent evolution of a possibly homodimeric prokaryotic 
enzyme. 
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Entered Medline: 20010802 
AB Multiple alignment, since its introduction in the early seventies, has 
become a cornerstone of modern molecular biology. It has traditionally 
been used to deduce structure/function by homology, to detect conserved 
motifs and in phylogenetic studies. There has recently been some renewed 
interest in the development of multiple alignment techniques, with current 
opinion moving away from a single all-encompassing algorithm to 
iterative and/or co-operative strategies. The exploitation of 
multiple alignments in genome annotation projects represents a qualitative 
leap in the functional analysis process, opening the way to the study of 
the co-evolution of validated sets of proteins and to reliable 
phylogenomic analysis. However, the alignment of the highly complex 
proteins detected by today's advanced database search methods is a 
daunting task. In addition, with the explosion of the sequence 
databases and with the establishment of numerous specialized biological 
databases, multiple alignment programs must evolve if they are to 
successfully rise to the new challenges of the post-genomic era. The way 
forward is clearly an integrated system bringing together sequence data, 
knowledge-based systems and prediction methods with their inherent 
unreliability. The incorporation of such heterogeneous, often 
non-consistent, data will require major changes to the fundamental 
alignment algorithms used to date. Such an integrated multiple alignment 
system will provide an ideal workbench for the validation, propagation and 
presentation of this information in a format that is concise, clear and 
intuitive. 
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AB The PSI-BLAST algorithm has been acknowledged as one of the most powerful 
tools for detecting remote evolutionary relationships by sequence 
considerations only. This has been demonstrated by its ability to 
recognize remote structural homologues and by the greatest coverage it 
enables in annotation of a complete genome. Although recognizing the 
correct fold of a sequence is of major importance, the accuracy of the 
alignment is crucial for the success of modeling one sequence by the 
structure of its remote homologue. Here we assess the accuracy of 
PSI-BLAST alignments on a stringent database of 123 structurally 
similar, sequence-dissimilar pairs of proteins, by comparing them to the 
alignments defined on a structural basis. Each protein sequence is 
compared to a nonredundant database of the protein sequences by 
PSI-BLAST. Whenever a pair member detects its pair-mate, the positions 
that are aligned both in the sequential and structural alignments are 
determined, and the alignment sensitivity is expressed as the percentage 
of these positions out of the structural alignment. Fifty-two sequences 
detected their pair-mates (for 16 pairs the success was bi-directional 
when either pair member was used as a query). The average percentage of 
correctly aligned residues per structural alignment was 43.5 +- 2.2%. 
Other properties of the alignments were also examined, such as the 
sensitivity vs. specificity and the change in these parameters over 
consecutive iterations. Notably, there is an improvement in alignment 
sensitivity over consecutive iterations, reaching an average of 50.9 +- 
2.5% within the five iterations tested in the current study. 
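
The alignment-sensitivity measure described in the abstract above (positions aligned identically in the sequence and structural alignments, as a percentage of the structural alignment) can be sketched in Python. The representation of an alignment as (query position, target position) pairs and the name `alignment_sensitivity` are assumptions made for this illustration; they are not taken from the cited study.

```python
def alignment_sensitivity(sequence_pairs, structural_pairs):
    """Percentage of structurally aligned position pairs that the
    sequence alignment reproduces exactly."""
    structural = set(structural_pairs)
    if not structural:
        raise ValueError("structural alignment is empty")
    shared = structural & set(sequence_pairs)
    return 100.0 * len(shared) / len(structural)


# Toy alignments (made-up data): three of the four structurally
# aligned pairs are also present in the sequence alignment.
structural = [(1, 1), (2, 2), (3, 4), (4, 5)]
sequence = [(1, 1), (2, 2), (3, 3), (4, 5)]
sensitivity = alignment_sensitivity(sequence, structural)
```

Using sets keeps the comparison independent of the order in which aligned pairs are listed.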
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AB A number of well-known bacterial toxins ADP-ribosylate and thereby 

inactivate target proteins in their animal hosts. Recently, several 
vertebrate ecto-enzymes (ART1-ART7) with activities similar to bacterial 
toxins have also been cloned. We show here that PSIBLAST, a 
position-specific-iterative database search program, faithfully 
connects all known vertebrate ecto-mono(ADP-ribosyl)transferases (mADPRTs) 
with most of the known bacterial mADPRTs. Intriguingly, no matches were 
found in the available public genome sequences of archaebacteria, the 
yeast Saccharomyces cerevisiae or the nematode Caenorhabditis elegans. 
Significant new matches detected by PSIBLAST from the public sequence 
data bases included only one open reading frame (ORF) of previously 
unknown function: the spvB gene contained in the virulence plasmids of 
Salmonella enterica. Structure predictions of SpvB indicated that it is 
composed of a C-terminal ADP-ribosyltransferase domain fused via a poly 
proline stretch to an N-domain resembling the N-domain of the secretory 
toxin TcaC from nematode-infecting enterobacteria. We produced the 
predicted catalytic domain of SpvB as a recombinant fusion protein and 
demonstrate that it, indeed, acts as an ADP-ribosyltransferase. Our 
findings underscore the power of the PSIBLAST program for the discovery 
of new family members in genome databases. Moreover, they open a new 
avenue of investigation regarding salmonella pathogenesis. 
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AB MOTIVATION: Sequence alignment techniques have been developed into 

extremely powerful tools for identifying the folding families and function 
of proteins in newly sequenced genomes. For a sufficiently low sequence 
identity it is necessary to incorporate additional structural information 
to positively detect homologous proteins. We have carried out an 
extensive analysis of the effectiveness of incorporating secondary 
structure information directly into the alignments for fold recognition 
and identification of distant protein homologs. A secondary structure 
similarity matrix based on a database of three-dimensionally aligned 
proteins was first constructed. An iterative application of dynamic 
programming was used which incorporates linear combinations of amino acid 
and secondary structure sequence similarity scores. Initially, only 
primary sequence information is used. Subsequently contributions from 
secondary structure are phased in and new homologous proteins are 
positively identified if their scores are consistent with the 
predetermined error rate. RESULTS: We used the SCOP40 database, where 
only PDB sequences that have 40% homology or less are included, to 
calibrate homology detection by the combined amino acid and secondary 
structure sequence alignments. Combining predicted secondary structure 
with sequence information results in an 8-15% increase in homology 
detection within SCOP40 relative to the pairwise alignments using only 
amino acid sequence data at an error rate of 0.01 errors per query; a 
35% increase is observed when the actual secondary structure sequences 
are used. Incorporating predicted secondary structure information in the 
analysis of six small genomes yields an improvement in the homology 
detection of approximately 20% over SSEARCH pairwise alignments, but no 
improvement in the total number of homologs detected over PSI-BLAST, at an 
error rate of 0.01 errors per query. However, because the pairwise 
alignments based on combinations of amino acid and secondary structure 
similarity are different from those produced by PSI-BLAST and the error 
rates can be calibrated, it is possible to combine the results of both 
searches. An additional 25% relative improvement in the number of genes 
identified at an error rate of 0.01 is observed when the data is pooled in 
this way. Similarly for the SCOP40 dataset, PSI-BLAST detected 15% of all 
possible homologs, whereas the pooled results increased the total number 
of homologs detected to 19%. These results are compared with recent 
reports of homology detection using sequence profiling methods. 
AVAILABILITY: Secondary structure alignment homepage at 
http://lutece.rutgers.edu/ssas CONTACT: anders@rutchem.rutgers.edu; 
ronlevy@lutece.rutgers.edu Supplementary Information: Genome 
sequence/structure alignment results at 
http://lutece.rutgers.edu/ss_fold_predictions. 
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AB Motivation: Sensitive detection and masking of low-complexity regions in 
protein sequences. Filtered sequences can be used in sequence 
comparison without the risk of matching compositionally biased regions. 
The main advantage of the method over similar approaches is the selective 
masking of single residue types without affecting other, possibly 
important, regions. Results: A novel algorithm for low-complexity region 
detection and selective masking. The algorithm is based on multiple-pass 
Smith-Waterman comparison of the query sequence against twenty 
homopolymers with infinite gap penalties. The output of the algorithm is 
both the masked query sequence for further analysis, e.g. database 
searches, as well as the regions of low complexity. The detection of 
low-complexity regions is highly specific for single residue types. It is 
shown that this approach is sufficient for masking database query 
sequences without generating false positives. The algorithm is 
benchmarked against widely available algorithms using the 210 genes of 
Plasmodium falciparum chromosome 2, a dataset known to contain a large 
number of low-complexity regions. Availability: CAST (version 1.0) 
executable binaries are available to academic users free of charge under 
license. Web site entry point, server and additional material: 
http://www.ebi.ac.uk/research/cgg/services/cast/. Contact: 
ouzounis@ebi.ac.uk. 
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AB Sequence database searches, using iterative-profile and 

Hidden-Markov-model approaches, were used to detect hitherto-undetected 
homologues of proteins that regulate the endoplasmic reticulum 
(ER)-associated degradation pathway. The translocon-associated subunit 
Sec63p (Sec=secretory) was shown to contain a domain of unknown function 
found twice in several Brr2p-like RNA helicases (Brr2=bad response to 
refrigeration 2). Additionally, Cue1p (Cue=coupling of ubiquitin 
conjugation to ER degradation), a yeast protein that recruits the 
ubiquitin-conjugating (UBC) enzyme Ubc7p to an ER-associated complex, was 
found to be one of a large family of putative scaffolding-domain- 
containing proteins that include the autocrine motility factor receptor 
and fungal Vps9p (Vps=vacuolar protein sorting). Two other yeast 
translocon-associated molecules, Sec72p and Hrd3p (Hrd=3-hydroxy-3- 
methylglutaryl-CoA reductase degradation), were shown to contain multiple 
tetratricopeptide-repeat-like sequences. From this observation it is 
suggested that Sec72p associates with a heat-shock protein, Hsp70, in a 
manner analogous to that known for Hop (Hsp70/Hsp90 organizing protein). 
Finally, the luminal portion of Ire1p (Ire=high inositol-requiring), 
thought to convey the sensing function of this transmembrane kinase and 
endoribonuclease, was shown to contain repeats similar to those in 
beta-propeller proteins. This finding hints at the mechanism by which 
Ire1p may sense extended unfolded proteins at the expense of compact 
folded molecules. 
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AB MOTIVATION: For large-scale structural assignment to sequences, as in 
computational structural genomics, a fast yet sensitive sequence 
search procedure is essential. A new approach using intermediate 
sequences was tested as a shortcut to iterative multiple sequence 
search methods such as PSI-BLAST. RESULTS: A library containing 
potential intermediate sequences for proteins of known structure 
(PDB-ISL) was constructed. The sequences in the library were collected 
from a large sequence database using the sequences of the domains of 
proteins of known structure as the query sequences and the program 
PSI-BLAST. Sequences of proteins of unknown structure can be matched to 
distantly related proteins of known structure by using pairwise sequence 
comparison methods to find homologues in PDB-ISL. Searches of PDB-ISL 
were calibrated, and the number of correct matches found at a given error 
rate was the same as that found by PSI-BLAST. The advantage of this 
library is that it uses pairwise sequence comparison methods, such as 
FASTA or BLAST2, and can, therefore, be searched easily and, in many 
cases, much more quickly than an iterative multiple sequence 
comparison method. The procedure is roughly 20 times faster than 
PSI-BLAST for small genomes and several hundred times for large genomes. 
AVAILABILITY: Sequences can be submitted to the PDB-ISL servers at 
http://stash.mrc-lmb.cam.ac.uk/PDB_ISL/ or 
http://cyrah.ebi.ac.uk:1111/Serv/PDB_ISL/ and can be downloaded from 
ftp://ftp.ebi.ac.uk/pub/contrib/jong/PDB_ISL/ CONTACT: sat@mrc-lmb.cam.ac.uk 
and jong@ebi.ac.uk 
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AB Using a number of diverse protein families as test cases, we investigate 
the ability of the recently developed iterative sequence database 
search method, PSI-BLAST, to identify subtle relationships between 
proteins that originally have been deemed detectable only at the level of 
structure-structure comparison. We show that PSI-BLAST can detect many, 
though not all, of such relationships, but the success critically depends 
on the optimal choice of the query sequence used to initiate the 
search. Generally, there is a correlation between the diversity of the 
sequences detected in the first pass of database screening and the 
ability of a given query to detect subtle relationships in subsequent 
iterations. Accordingly, a thorough analysis of protein superfamilies at 
the sequence level is necessary in order to maximize the chances of 
gleaning non-trivial structural and functional inferences, as opposed to a 
single search, initiated, for example, with the sequence of a protein 
whose structure is available. This strategy is illustrated by several 
findings, each of which involves an unexpected structural prediction: (i) 
a number of previously undetected proteins with the HSP70-actin fold are 
identified, including a highly conserved and nearly ubiquitous family of 
metal-dependent proteases (typified by bacterial O-sialoglycoprotease) 
that represent an adaptation of this fold to a new type of enzymatic 
activity; (ii) we show that, contrary to the previous conclusions, 
ATP-dependent and NAD-dependent DNA ligases are confidently predicted to 
possess the same fold; (iii) the C-terminal domain of 3-phosphoglycerate 
dehydrogenase, which binds serine and is involved in allosteric regulation 
of the enzyme activity, is shown to typify a new superfamily of 
ligand-binding, regulatory domains found primarily in enzymes and 
regulators of amino acid and purine metabolism; (iv) the 
immunoglobulin-like DNA-binding domain previously identified in the 
structures of transcription factors NFkappaB and NFAT is shown to be a 
member of a distinct superfamily of intracellular and extracellular 
domains with the immunoglobulin fold; and (v) the Rag-2 subunit of the 
V-D-J recombinase is shown to contain a kelch-type beta-propeller domain 
which rules out its evolutionary relationship with bacterial transposases. 
Copyright 1999 Academic Press. 
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AB All the detectable metallo-beta-lactamase fold proteins were identified in 
the publicly available sequence databases and complete genome 
sequences using iterative profile searches with the PSI-BLAST 
program and motif searches with position specific weight matrices. 
The catalytic site/mechanism and the corresponding structural elements 
were characterized for these proteins based on the available structure of 
the Bacillus zinc-dependent beta-lactamase. Based on pair-wise sequence 
and phylogenetic analysis an evolutionary classification for enzymes of 
this fold was developed and discussed in terms of implications for 
substrate specificity. Finally, some predicted inactive members which 
have been recruited for non-enzymatic functions such as microtubule 
binding in a cytoskeletal MAP1 are described. 
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AB We have developed an algorithm (MassDynSearch) for identifying proteins 
using a combination of peptide masses with small associated sequences 
(tags). Unlike the approach developed by Matthias Mann, 'Tag searching', 
in which the sequence tags are generated by gas phase fragmentation of 
peptides in a mass spectrometer, 'Rag Tag' searching uses peptide tags 
which are generated enzymatically or chemically. The protein is digested 
either chemically or with an endopeptidase and the resultant mixture is 
then subjected to partial exopeptidase degradation. The mixture is 
analyzed by matrix assisted laser desorption and ionization time of flight 
mass spectrometry and a list of intact peptide masses is generated, each 
associated with a set of degradation product masses which serve as unique 
tags. These 'tagged masses' are used as the input to an algorithm we have 
written, MassDynSearch, which searches protein and DNA databases for 
proteins which contain similar tagged motifs. The method is simple, rapid 
and can be fully automated. The main advantage of this approach is that 
the specificity of the initial digestion is unimportant since multiple 
peptides with tags are used to search the database. This is 
especially useful for proteins like membrane, cytoskeletal, and other 
proteins where specific endopeptidases are less efficient and lower 
specificity proteases such as chymotrypsin, pepsin, and elastase must be 
used. 
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AU Graul R C; Sadee W 

CS Department of Biopharmaceutical Sciences, University of California San 
Francisco 94143-0446, USA. 
NC DA04166 (NIDA) 

GM37188 (NIGMS) 

GM43102 (NIGMS) 
SO Pharmaceutical research, (1997 Nov) 14 (11) 1533-41. 

Journal code: 8406521. ISSN: 0724-8741. 
CY United States 

DT Journal; Article; (JOURNAL ARTICLE) 

LA English 

FS Priority Journals 

EM 199802 

ED Entered STN: 19980226 

Last Updated on STN: 19980226 
Entered Medline: 19980219 

AB PURPOSE: Searching the existing databases for homologous sequences is 
essential to understanding a protein's structure and function. For a 
query sequence, its nearest neighbors can be identified by BLAST (basic 
local alignment search tool). However, a single query sequence is 
insufficient to define the entire neighborhood of related sequences, and 
multiple BLAST queries are needed. We describe here a program which 
permits automated and iterative BLAST analysis of an entire neighborhood 
of sequences and apply this to search for homologs of the 
bacteriorhodopsins outside the archaea phylum. METHODS: We have developed 
a Java program, 'Iterative Neighborhood Cluster Analysis' (INCA), 
which performs iterative BLAST searches, beginning with a single 
starter sequence, and proceeding with any other sequence achieving a 
predefined minimum alignment score. This results in a cluster of 
sequences where each sequence is related to at least one other 
sequence by the cutoff score, and additional lists of more distantly related 
sequences for each member of the cluster. RESULTS: Bacteriorhodopsins had 
not been previously aligned with any other protein family with scores 
indicative of probable homology. Using INCA, we identified a probable 
homolog in yeast, YRO2_YEAST, also containing seven putative transmembrane 
domains. A finding of probable homology was supported by additional 
alignment strategies. CONCLUSIONS: INCA is a useful tool to assess 
complete protein neighborhoods. With an increasing database, INCA can 
serve to detect the emergence of evolutionary links between even the most 
distantly related protein families. Identifying a homolog of the 
bacteriorhodopsins in yeast illustrates this approach but at the same time 
highlights the vast evolutionary distances between polytopic membrane 
proteins, such as the bacteriorhodopsins. 
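
The iterative neighborhood search described in the abstract above can be sketched as a breadth-first expansion from one starter sequence. This is a hedged Python illustration, not the authors' Java program: `neighborhood_cluster`, the `search` callback (a stand-in for a real BLAST run), and the toy score table are hypothetical.

```python
from collections import deque


def neighborhood_cluster(starter, search, cutoff):
    """Iteratively expand a cluster from a single starter sequence.

    search(query) is a caller-supplied stand-in for a BLAST run and must
    return an iterable of (hit_id, score) pairs.
    Returns the cluster and, per cluster member, its below-cutoff hits.
    """
    cluster = {starter}
    distant = {}              # member -> more distantly related hits
    queue = deque([starter])
    while queue:
        query = queue.popleft()
        distant[query] = []
        for hit, score in search(query):
            if hit == query:
                continue
            if score >= cutoff:
                if hit not in cluster:
                    cluster.add(hit)
                    queue.append(hit)   # searched in a later iteration
            else:
                distant[query].append(hit)
    return cluster, distant


# Toy score table standing in for real BLAST output (made-up values).
TOY_HITS = {
    "A": [("B", 120), ("C", 40)],
    "B": [("A", 120), ("D", 95)],
    "D": [("B", 95), ("E", 30)],
}

cluster, distant = neighborhood_cluster(
    "A", lambda q: TOY_HITS.get(q, []), cutoff=90
)
```

In this toy run, D enters the cluster through B even though the starter A never hits it directly, which is the behaviour the abstract describes: each member is related to at least one other member by the cutoff score.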
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TI Cloning and sequence analysis of the candidate nicotinic acetylcholine 
receptor alpha subunit gene tar-1 from Trichostrongylus colubriformis. 
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AB A T. colubriformis genomic library in lambda EMBL3 was screened for 
sequences homologous to the Caenorhabditis elegans unc-38 nicotinic 
acetylcholine receptor (nAChR) alpha-subunit gene. The candidate gene 
tar-1 (for Trichostrongylus acetylcholine receptor subunit gene 1) 
comprising 13704 base pairs was thus identified. BLAST comparison of the 
sequenced clone with GenBank, followed by comparison of translated regions 
in six reading frames with protein databases, identified clearly defined 
tracts corresponding to 12 putative exons sharing high sequence homology 
to other nAChR genes and able to code for sequential regions of a 
putative nAChR alpha-subunit protein (tar-1). Tar-1 shares sequence 
similarities with over 40 nAChR subunit proteins. The highest similarity 
(91.6%) is with unc-38, suggesting that nAChR sequences from nematodes 
are closely related. The sequence includes motifs typical of these 
molecules including adjacent cysteine residues at the ACh binding site and 
four transmembrane regions. The DNA sequence presents the longest 
genomic tract described for this organism and should prove useful as a 
probe source in the search for nAChR genes from this and other nematodes 
and for studying the molecular mechanism of resistance to levamisole, a 
drug which is known to act on nAChRs of worms and which is widely used for 
parasite control. 
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AB A tool for searching pattern and fingerprint databases is described. 
Fingerprints are groups of motifs excised from conserved regions of 
sequence alignments and used for iterative database scanning. The 
constituent motifs are thus encoded as small alignments in which 
sequence information is maximised with each database pass; they 
therefore differ from regular-expression patterns, in which alignments are 
reduced to single consensus sequences. Different database formats 
have evolved to store these disparate types of information, namely the 
PROSITE dictionary of patterns and the PRINTS fingerprint database, but 
programs have not been available with the flexibility to search them 
both. We have developed a facility to do this: the system allows query 
sequences to be scanned against either PROSITE, the full PRINTS 
database, or against individual fingerprints. The results of 
fingerprint searches are displayed simultaneously in both text and 
graphical windows to render them more tangible to the user. Where 
structural coordinates are available, identified motifs may be visualised 
in a 3D context. The program runs on Silicon Graphics machines using GL 
graphics libraries and on machines with X servers supporting the PEX 
extension: its use is illustrated here by depicting the location of 
low-density lipoprotein-binding (LDL) motifs and leucine-rich repeats in a 
mosaic G-protein-coupled receptor (GPCR). 
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AB Currently available sequence alignment programs are generally not 

capable of detecting functional and structural homologs in the twilight 
zone of sequence similarity, i.e. when the sequence identity falls 
below about 25%. Here we attempt to detect such weak similarities using 
an approach based on a notion of protein sequence similarity radically 
different from that used in sequential alignment. The approach defines 






protein sequence dissimilarity (or distance) as a weighted sum of 
differences of compositional properties such as singlet and doublet amino 
acid composition, molecular weight, isoelectric point (protein property 
search or PropSearch). With PropSearch, either single sequences can 
be used for a database query, or multiple sequences can be merged into 
an "average" sequence reflecting the average composition of a protein 
family. First, we show that members of structural protein families have a 
low mutual PropSearch distance when the weights are optimized to 
discriminate maximally between structural families. Second, we 
demonstrate the results of database searches using the PropSearch 
method. Such searches are very rapid when scanning a preprocessed 
database and do not require alignments. In cases in which conventional 
alignment tools fail to detect similarities, PropSearch can be used to 
generate hypotheses about possible structural or functional relationships 
between a new sequence and sequences in the database. 
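
The distance described in the abstract above (a weighted sum of differences between compositional properties) can be sketched in Python. This is an illustrative approximation, not the PropSearch program: it uses only singlet amino acid composition, takes absolute differences, and applies uniform weights, all of which are assumptions made here. The function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def singlet_composition(seq):
    """Fraction of each of the 20 standard amino acids in seq."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def property_distance(props_a, props_b, weights):
    """Weighted sum of absolute differences between two property vectors."""
    if not (len(props_a) == len(props_b) == len(weights)):
        raise ValueError("vectors must have equal length")
    return sum(w * abs(a - b) for w, a, b in zip(weights, props_a, props_b))


# Uniform weights are a placeholder; the abstract describes optimized weights.
uniform = [1.0] * len(AMINO_ACIDS)

# Same composition in a different order: distance 0, no alignment needed.
d_same = property_distance(
    singlet_composition("ACDA"), singlet_composition("AACD"), uniform
)
# Completely different compositions.
d_diff = property_distance(
    singlet_composition("AAAA"), singlet_composition("CCCC"), uniform
)
```

Because the distance depends only on precomputed property vectors, a database can be preprocessed once and scanned quickly, which matches the abstract's remark that such searches do not require alignments.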
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AB Predicting the structural fold of a protein is an important and 

challenging problem. Available computer programs for determining 
whether a protein sequence is compatible with a known 3-dimensional 
structure fall into 2 categories: (1) structure-based methods, in which 
structural features such as local conformation and solvent accessibility 
are encoded in a template, and (2) sequence-based methods, in which 
aligned sequences of a set of related proteins are encoded in a 
template. In both cases, the programs use a static template based on a 
predetermined set of proteins. Here, we describe a computer-based method, 
called iterative template refinement (ITR), that uses templates 
combining structure-based and sequence-based information and employs an 
iterative search procedure to detect related proteins and sequentially 
add them to the templates. Starting from a single protein of known 
structure, ITR performs sequential cycles of database search to 
construct an expanding tree of templates with the aim of identifying 
subtle relationships among proteins. Evaluating the performance of ITR on 
6 proteins, we found that the method automatically identified a variety of 
subtle structural similarities to other proteins. For example, the method 
identified structural similarity between arabinose-binding protein and 
phosphofructokinase, a relationship that has not been widely recognized. 
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AB We have implemented a parallel version of a dynamic programming biological 
sequence comparison algorithm to study the potential applicability of 
using parallel computers for genetic sequence comparisons. Our parallel 
program is built using C-Linda, a machine-independent parallel 
programming language, and was tested on both a 10 CPU Sequent Symmetry and 
a 64 CPU Intel Hypercube. C-Linda implements a shared associative memory 
model, "tuple space," through which multiple processes can communicate and 
coordinate control. In our master-worker (MW) parallel implementation, a 
master process creates several worker processes, extracts a test 
sequence and multiple library sequences from a database and stores 
them in tuple space. Each worker reads the test sequence and then 
repeatedly extracts library strings from tuple space, performs pairwise 
sequence comparison using a local comparison algorithm to generate a 
similarity score, and returns the similarity scores to tuple space. The 
master collects the scores from tuple space and identifies the best match 
over all library sequences. We also implemented a method of global 
interworker communication to reduce the total search time by stopping 
those string comparisons that had no chance of improving on the current 
best match. Comparisons of the total run time, speedup, and efficiency 
were made for parallel and sequential versions of a basic MW 
implementation as well as versions with the global abort threshold. 
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